ABSTRACT

This study was designed to gather data on the meaning
of imprecise terms used in items written by physicians for their
students and by test committees for national licensure and
certification examinations. A total of 32 members of test committees
who write examination items for various medical specialty
examinations participated in the study. Each participant was provided
with a list of phrases used in multiple-choice items to express some
concept related to frequency of occurrence. Participants were asked
to indicate what percentage of time was reflected by each phrase and
to express this percentage both as a single number (e.g., 75% of the
time) and as a band (e.g., 60 to 80% of the time). The list included
15 terms. The responses of two participants were excluded because they
were highly aberrant. Results indicate that the phrases used by item
writers to express frequency do not have an operational definition that
is commonly shared. Both the single values and the bands assigned to the
phrases varied considerably across individuals. Problems created by vague
terms are much more severe for true/false type items than for one-best
answer items. These findings support the contention that general
guidelines on item writing would be beneficial to the medical community.
One table and five figures are provided. (TJH)
How Often Is "Often"?
The Use of Imprecise Terms in Exam Items



Susan M. Case, Ph.D. 
Senior Evaluation Officer 
National Board of Medical Examiners 



Introduction 

Textbooks on item writing indicate that terms such as often, usually, 
and frequently should be avoided in multiple-choice questions.
Physicians who write exam items for their students and those who serve on
test committees for national licensure and certification examinations are
generally unfamiliar with the educational literature and use imprecise
terms such as these with great regularity. Item-writing workshops that
simply quote experts from the field of education have had little
impact on this behavior, at least in part because physicians
believe the terms do have a common definition among practitioners. The
language that they use in their items reflects the language used in
medical discussions (eg, "Obesity is frequently associated with
hypertension").

The purpose of this study was to gather data on the meaning of imprecise 
terms from the item writers themselves. If consensus about the meaning 
of these terms was found among physicians, then the general guidelines 
related to item writing would not apply to the specific area of medical 
education and evaluation. If, on the other hand, little consensus on the 
meaning of the terms was found, then the data could be used as an 
illustration to support the contention that the guidelines are applicable 
to the medical community. 

Method 

A total of 32 members of test committees who write examination items for
various medical specialty examinations participated in the study. Each
participant was presented with a list of phrases used in multiple-choice
questions to express some concept related to frequency of occurrence. They
were asked to indicate what percentage of time was reflected by each
phrase and to express this percentage both as a single number (eg, 75% of
the time) and as a band (eg, 60 to 80% of the time). Fifteen terms were
included on the list. The responses of two individuals were excluded from
the analysis because they were highly aberrant, leaving 30 respondents.
Results

Table 1 shows the mean and standard deviation of the single number
associated with each term on the list. The means ranged from a high of
84% of the time for the phrase "most of the time" to a low of 5% of the
time for the phrase "almost never". There was considerable variation in
the values listed for each term; standard deviations ranged from a low of
5.5 to a high of 21.7 percentage points. For over half of the phrases, the
values listed spanned 50 percentage points or more.

Figures 1 and 2 show the distribution of responses for two of the
phrases, "most of the time" and "often", selected to serve as
illustrations of the data. A total of 40% of the respondents defined
"most of the time" as 90% of the time, while 30% defined it as 80% of the
time. Despite this difference of opinion, this is more consensus than
was found for most of the other phrases. Overall, the values used to
define the phrase ranged from a low of 60% of the time to a high of 99%
of the time. The mean was 84%.
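The summary statistics printed with Figure 1 appear to be standard SPSS-style frequency output. The sketch below assumes the conventional bias-adjusted formulas for sample skewness (G1) and excess kurtosis (G2), together with their standard errors, which depend only on the number of respondents n; the function names are illustrative, not taken from the study. With n = 30 (32 participants minus the two excluded), the sketch reproduces the printed mean (2524 / 30), standard error of the mean, S E Skew, and S E Kurt.

```python
import math
import statistics


def se_skewness(n: int) -> float:
    """Standard error of the bias-adjusted sample skewness (G1) for n cases."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n: int) -> float:
    """Standard error of the bias-adjusted excess kurtosis (G2) for n cases."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


def summarize(responses: list[float]) -> dict[str, float]:
    """Summary statistics for one term's single-number responses (percent of time)."""
    n = len(responses)
    mean = statistics.fmean(responses)
    sd = statistics.stdev(responses)  # sample SD (n - 1 denominator)
    z = [(x - mean) / sd for x in responses]
    skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "n": n,
        "mean": mean,
        "std_err": sd / math.sqrt(n),
        "std_dev": sd,
        "variance": sd ** 2,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "se_skew": se_skewness(n),
        "se_kurt": se_kurtosis(n),
        "range": max(responses) - min(responses),
    }


# Checks against the Figure 1 printout ("Most of the time"), using only its
# reported sum (2524), standard deviation (7.482), and n = 30.
n = 30
print(round(2524 / n, 3))              # 84.133, the printed mean
print(round(7.482 / math.sqrt(n), 3))  # 1.366, the printed Std Err
print(round(se_skewness(n), 3), round(se_kurtosis(n), 3))  # 0.427 0.833
```

Because the standard errors of skewness and kurtosis depend only on n, their agreement with the printout is a simple check that the Figure 1 statistics were computed over the 30 retained respondents.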

In contrast, approximately 20% of the respondents defined "often" as 60%
of the time; an additional 20% defined it as 70% of the time. More than
half of the respondents listed other values, ranging from a low of 20% to
a high of 90%, with a mean of 60%.

Figure 3 shows box plots for each term. The center of each box is the
mean; the top and bottom of the box lie 1 SD above and below the mean.
The lines extend to the lowest and highest values listed.
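Note that these boxes are built from the mean and standard deviation rather than from the median and quartiles used in conventional (Tukey) box plots. A minimal sketch of that construction, using the means and SDs printed with Figures 1 and 2 and the lowest and highest values reported in the Results (60% and 99% for "most of the time"; 20% and 90% for "often"); the class and function names are illustrative:

```python
from dataclasses import dataclass


@dataclass
class MeanSdBox:
    """Box drawn as described for Figure 3 (mean +/- 1 SD, whiskers to extremes)."""
    whisker_low: float   # lowest value listed
    box_low: float       # mean - 1 SD
    center: float        # mean
    box_high: float      # mean + 1 SD
    whisker_high: float  # highest value listed


def mean_sd_box(mean: float, sd: float, lowest: float, highest: float) -> MeanSdBox:
    return MeanSdBox(lowest, mean - sd, mean, mean + sd, highest)


def boxes_overlap(a: MeanSdBox, b: MeanSdBox) -> bool:
    """True if the two +/- 1 SD boxes share any values."""
    return a.box_low <= b.box_high and b.box_low <= a.box_high


most_of_the_time = mean_sd_box(84.133, 7.482, 60, 99)
often = mean_sd_box(60.5, 18.399, 20, 90)
print(round(most_of_the_time.box_low, 3), round(most_of_the_time.box_high, 3))  # 76.651 91.615
print(round(often.box_low, 3), round(often.box_high, 3))                        # 42.101 78.899
print(boxes_overlap(most_of_the_time, often))                                   # True
```

Even these two terms, chosen to illustrate the extremes of agreement, have overlapping boxes: the top of the "often" box (about 79%) lies above the bottom of the "most of the time" box (about 77%).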

The overlap between the terms was considerable. Terms with a broad range
of values included "commonly" (20% to 90%), "frequently" (20% to 95%),
and "likely to occur" (25% to 95%). The phrase with the most consensus
was "approximately half", which was defined as 50% of the time by 87% of
the respondents.

There was little consensus regarding the bands for the terms. Figures 4
and 5 show the bands used by respondents to define the two terms used for
illustration above, "most of the time" and "often". The phrase "most of
the time" showed more consensus than most of the other phrases. The band
used most often to define the phrase was 70-90% of the time (20% of the
respondents). Three other bands were each listed by 13% of the
respondents: 75-99%, 80-95%, and 85-95%.

Only one of the bands for the term "often" was listed by as many as 13% of
the respondents: 60-81%. Only two additional bands were listed by more
than one person: 70-80% and 50-70%.
Discussion



The results indicate that the phrases used by item writers to express
frequency do not have an operational definition that is commonly shared.
The interval from one standard deviation below the mean to one standard
deviation above it typically spanned more than 25 percentage points.
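The width of the interval from one SD below the mean to one SD above it is simply 2 x SD. The sketch below computes these intervals from the means and SDs printed in Table 1 (values in percent of the time); the variable names are illustrative. By this measure, 7 of the 14 terms shown in Table 1 span more than 25 points, and "most likely to occur" spans exactly 25.

```python
# Means and SDs (percent of the time) as printed in Table 1.
TABLE_1 = {
    "Most of the time": (84.1, 7.5),
    "Most likely to occur": (78.3, 12.5),
    "Primarily": (77.9, 13.1),
    "Most often": (72.7, 15.0),
    "Usually": (71.9, 10.4),
    "Likely to occur": (65.8, 16.1),
    "Probably": (64.9, 11.5),
    "Commonly": (62.6, 16.2),
    "Often associated with": (61.0, 15.7),
    "Often": (60.5, 18.4),
    "Associated with": (59.5, 21.7),
    "Approximately half": (51.9, 8.6),
    "Rarely": (8.2, 5.5),
    "Almost never": (5.1, 5.7),
}


def one_sd_interval(mean: float, sd: float) -> tuple[float, float]:
    """Interval covered by the mean plus or minus one standard deviation."""
    return mean - sd, mean + sd


for term, (mean, sd) in TABLE_1.items():
    low, high = one_sd_interval(mean, sd)
    print(f"{term:22s} {low:5.1f} to {high:5.1f}  width {2 * sd:4.1f}")

wider_than_25 = [term for term, (_, sd) in TABLE_1.items() if 2 * sd > 25]
print(len(wider_than_25), "of", len(TABLE_1), "terms span more than 25 points")
```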

Not only did the single values vary considerably among individuals, but
the bands assigned to the phrases varied considerably as well. Phrases
that had precise definitions (narrow bands) for one individual were not
necessarily those that had precise definitions for others. In
addition, for some phrases (eg, "often"), individual respondents defined
the phrase rather precisely, but there was little overlap among
individuals in the values included in their bands.
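Overlap among individuals' bands can be made concrete by intersecting them. The sketch below uses only the most frequently listed bands reported in the Results (70-90%, 75-99%, 80-95%, and 85-95% for "most of the time"; 60-81%, 70-80%, and 50-70% for "often"); it is an illustration, not an analysis performed in the study, and the function name is hypothetical. The four "most of the time" bands share the values 85-90%, whereas the three "often" bands share only the single value 70%.

```python
from itertools import combinations
from typing import Optional

Band = tuple[int, int]  # (low, high), percent of the time


def intersect(bands: list[Band]) -> Optional[Band]:
    """Values common to every band, or None if the bands share no value."""
    low = max(lo for lo, _ in bands)
    high = min(hi for _, hi in bands)
    return (low, high) if low <= high else None


MOST_OF_THE_TIME = [(70, 90), (75, 99), (80, 95), (85, 95)]
OFTEN = [(60, 81), (70, 80), (50, 70)]

for name, bands in (("most of the time", MOST_OF_THE_TIME), ("often", OFTEN)):
    print(name, "common to all:", intersect(bands))
    # Pairwise overlaps show which respondents' bands agree at all.
    for a, b in combinations(bands, 2):
        print(f"  {a} & {b} -> {intersect([a, b])}")
```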

It should be noted that imprecise terms are used in our everyday speech 
and in our writing. Many of the phrases used in the study are also used 
in the text of this paper, without (it is hoped) creating any confusion. 
However, imprecise terms may cause confusion when they are used in the 
text of examination items. 

These results have different implications for the various item formats.
Problems created by vague terms are much more severe for true-false type
items (K-, C-, and X-type items) than for one-best answer (A- and B-type)
items.

For example, imprecise terms cause major problems in multiple true-false
items such as the one below:

True statements about cystic fibrosis (CF) include: 

1. CF is associated with short stature. 

2. Children with CF usually die in their teens. 

3. Males with CF are often sterile. 

4. CF primarily affects the lungs. 

Modifying the item by specifying exact numbers does not correct the
problem. For example, the statement "the incidence in the US is 1:2000"
cannot be judged as true or false. Making it more vague by stating "the
incidence in the US is approximately 1:2000" does not help, since the band
is not specified. In true/false items, the appropriate treatment of
numerical information is either to generate a comparison (eg, the
incidence of CF is greater than that of juvenile diabetes) or to specify a
range (eg, the incidence of CF is greater than 1:1500).

The problem noted above with multiple true-false items is not as acute
with well-constructed one-best answer items (ie, those that pose a
clear question). For example, the following item has a vague term in the
stem, yet because the task is to select the one best answer, the item is
relatively unambiguous.



Children born with CF are most likely to die 

A. before the age of 1 

B. between the ages of 1 and 5 

C. between the ages of 5 and 10 

D. when they are teenagers 

E. when they are in their 20s 



Problems do arise with one-best answer items like the following: 

Children with CF have problems with their digestive systems 

A. frequently 

B. usually 

C. often 

D. most of the time 

The only way to make such an item worse is to add a fifth option, "none
of the above."
TABLE 1

Summary Data for Frequency Terms
(single-number responses, percent of the time)

Term                     Mean    SD
Most of the time         84.1    7.5
Most likely to occur     78.3   12.5
Primarily                77.9   13.1
Most often               72.7   15.0
Usually                  71.9   10.4
Likely to occur          65.8   16.1
Probably                 64.9   11.5
Commonly                 62.6   16.2
Often associated with    61.0   15.7
Often                    60.5   18.4
Associated with          59.5   21.7
Approximately half       51.9    8.6
Rarely                    8.2    5.5
Almost never              5.1    5.7
Figure 1

Distribution of the responses
defining the phrase "Most of the time"

Mean      84.133    Std Err     1.366    Median     85.000
Mode      90.000    Std Dev     7.482    Variance   55.982
Kurtosis   2.426    S E Kurt     .833    Skewness    -.994
S E Skew    .427    Range      39.000    Minimum    60.000
Maximum   99.000    Sum      2524.000
Figure 2
Distribution of the responses
defining the phrase "Often"

Value (%)   Count
20          1
25          1
30          2
40          2
(remaining rows of the distribution are not legible)

Mean      60.500    Std Err     3.359    Median     60.000
Mode      60.000    Std Dev    18.399    Kurtosis    -.218
Figure 3. Box plots showing distribution of responses for frequency terms



Figure 4
Bands Listed for
"Most of the Time"
Figure 5

Bands Listed for
"Often"

(Vertical axis: percent of the time, 5 to 100)

Bands listed: 10-30, 20-40, 25-35, 25-99, 30-50, 30-60, 35-65, 40-60,
40-75, 50-70, 50-75, 50-80, 50-99, 55-65, 60-80, 60-81, 60-90, 65-75,
65-80, 70-80, 70-89, 70-90, 75-99, 80-00, 85-99